


Supplementary material for 'Spike and slab variational Bayes for high dimensional logistic regression'

Neural Information Processing Systems

(Section 11). Lemma 2. Suppose the prior satisfies ... Lemma 3. Suppose the prior satisfies ... Lemma 4. Suppose the prior satisfies ... This is the most difficult technical step in establishing our result. Lemma 5. Consider the event ... We briefly explain the heuristic idea behind the proof of Lemma 5. Since the VB posterior ... We also work with the default parametrization of the bmlasso function of the BhGLM package. We provide five further test cases in addition to the experiment considered in Section 5. In all cases we consider Gaussian design matrices, but vary all other parameters. We ran each experiment 200 times and report the means and standard deviations of the performance metrics in Table 3.
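As a rough illustration of the experimental protocol described above, here is a minimal Python sketch that repeats a synthetic trial 200 times with a Gaussian design matrix and reports the mean and standard deviation of an error metric. The l1-penalised logistic regression is only a stand-in for the spike-and-slab VB and bmlasso fits, and the l2 estimation error is an assumed metric, not necessarily the one reported in Table 3.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def run_trial(n=100, p=200, s0=5):
    # One synthetic trial: Gaussian design, sparse logistic model.
    X = rng.standard_normal((n, p))                  # Gaussian design matrix
    beta = np.zeros(p)
    beta[:s0] = rng.normal(0.0, 2.0, s0)             # s0 active coefficients
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta)))
    # Stand-in estimator (NOT the paper's method): l1-penalised logistic fit.
    fit = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X, y)
    return np.linalg.norm(fit.coef_.ravel() - beta)  # l2 estimation error

errors = [run_trial() for _ in range(200)]           # 200 repetitions
print(f"mean {np.mean(errors):.3f}, sd {np.std(errors):.3f}")
```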


Logarithmic Smoothing for Adaptive PAC-Bayesian Off-Policy Learning

Haddouche, Maxime, Sakhi, Otmane

arXiv.org Machine Learning

Off-policy learning serves as the primary framework for learning optimal policies from logged interactions collected under a static behavior policy. In this work, we investigate the more practical and flexible setting of adaptive off-policy learning, where policies are iteratively refined and re-deployed to collect higher-quality data. Building on the success of PAC-Bayesian learning with Logarithmic Smoothing (LS) in static settings, we extend this framework to the adaptive scenario using tools from online PAC-Bayesian theory. Furthermore, we demonstrate that a principled adjustment to the LS estimator naturally accommodates multiple rounds of deployment and yields faster convergence rates under mild conditions. Our method matches the performance of leading offline approaches in static settings, and significantly outperforms them when intermediate policy deployments are allowed. Empirical evaluations across diverse scenarios highlight both the advantages of adaptive data collection and the strength of the PAC-Bayesian formulation.
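To make the smoothing idea concrete, the following is a minimal sketch of a logarithmic-smoothing-style importance-weighted value estimate, assuming rewards in [0, 1] and a smoothing parameter lam; the paper's exact estimator, its adaptive multi-round adjustment, and the PAC-Bayesian bounds are not reproduced here.

```python
import numpy as np

def ls_policy_value(rewards, pi_new, pi_log, lam=0.1):
    """Logarithmic-smoothing-style off-policy value estimate.

    Replaces each importance-weighted reward w*r by log(1 + lam*w*r)/lam,
    which caps the influence of large importance weights while recovering
    the plain IPS estimate as lam -> 0.
    """
    w = pi_new / pi_log                       # importance weights
    return np.mean(np.log1p(lam * w * rewards) / lam)

# Toy usage: logged propensities, new-policy propensities, rewards in [0, 1].
rng = np.random.default_rng(0)
pi_log = rng.uniform(0.1, 1.0, 1000)
pi_new = rng.uniform(0.1, 1.0, 1000)
rewards = rng.binomial(1, 0.3, 1000).astype(float)
print(ls_policy_value(rewards, pi_new, pi_log))
```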


Gaussian Process Regression for Improved Underwater Navigation

Cohen, Nadav, Klein, Itzik

arXiv.org Artificial Intelligence

Accurate underwater navigation is a challenging task due to the absence of global navigation satellite system signals and the reliance on inertial navigation systems that suffer from drift over time. Doppler velocity logs (DVLs) are typically used to mitigate this drift through velocity measurements, which are commonly estimated using a parameter estimation approach such as least squares (LS). However, LS works under the assumption of ideal conditions and does not account for sensor biases, leading to suboptimal performance. This paper proposes a data-driven alternative based on multi-output Gaussian process regression (MOGPR) to improve DVL velocity estimation. MOGPR provides velocity estimates and associated measurement covariances, enabling an adaptive integration within an error-state Extended Kalman Filter (EKF). We evaluate our proposed approach using real-world AUV data and compare it against LS and a state-of-the-art deep learning model, BeamsNet. Results demonstrate that MOGPR reduces velocity estimation errors by approximately 20% while simultaneously enhancing overall navigation accuracy, particularly in the orientation states. Additionally, the incorporation of uncertainty estimates from MOGPR enables an adaptive EKF framework, improving navigation robustness in dynamic underwater environments.
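Below is a minimal sketch of the core idea, assuming a recent scikit-learn (where GaussianProcessRegressor.predict(..., return_std=True) returns a per-output standard deviation): a multi-output GP maps DVL beam measurements to velocity components, and its predictive uncertainty sets the EKF measurement-noise covariance adaptively. The feature and target dimensions here are illustrative, not the paper's.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Synthetic training data: 4 beam velocities -> 3 body-frame velocity components.
rng = np.random.default_rng(0)
X_train = rng.standard_normal((200, 4))
y_train = X_train @ rng.standard_normal((4, 3))
y_train += 0.05 * rng.standard_normal(y_train.shape)

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(X_train, y_train)

# At run time: the GP mean is the velocity measurement, and the GP std sets
# the measurement-noise covariance R of the (error-state) EKF update.
x_new = rng.standard_normal((1, 4))
v_mean, v_std = gp.predict(x_new, return_std=True)
R = np.diag(v_std.ravel() ** 2)   # adaptive measurement-noise covariance
print(v_mean, R)
```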


Harnessing Data Augmentation to Quantify Uncertainty in the Early Estimation of Single-Photon Source Quality

Kedziora, David Jacob, Musiał, Anna, Rudno-Rudziński, Wojciech, Gabrys, Bogdan

arXiv.org Artificial Intelligence

Novel methods for rapidly estimating single-photon source (SPS) quality have been promoted in recent literature to address the expensive and time-consuming nature of experimental validation via intensity interferometry. However, the frequent lack of uncertainty discussions and reproducible details raises concerns about their reliability. This study investigates the use of data augmentation, a machine learning technique, to supplement experimental data with bootstrapped samples and quantify the uncertainty of such estimates. Eight datasets obtained from measurements involving a single InGaAs/GaAs epitaxial quantum dot serve as a proof-of-principle example. Analysis of one of the SPS quality metrics derived from efficient histogram fitting of the synthetic samples, i.e. the probability of multi-photon emission events, reveals significant uncertainty contributed by stochastic variability in the Poisson processes that describe detection rates. Ignoring this source of error risks severe overconfidence in both early quality estimates and claims for state-of-the-art SPS devices. Additionally, this study finds that standard least-squares fitting is comparable to using a Poisson likelihood, and expanding averages show some promise for early estimation. Also, reducing background counts improves fitting accuracy but does not address the Poisson-process variability. Ultimately, data augmentation demonstrates its value in supplementing physical experiments; its benefit here is to emphasise the need for a cautious assessment of SPS quality.
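The following is a minimal sketch of the augmentation idea under strong simplifying assumptions: coincidence peaks are reduced to integrated areas, the "fit" is a background-subtracted peak ratio rather than the paper's full histogram fitting, and bootstrapped samples are drawn by Poisson-resampling the observed counts.

```python
import numpy as np

# Synthetic coincidence histogram: ten side peaks of area A, one centre peak
# of area g2*A (g2 = multi-photon emission probability), plus flat background.
rng = np.random.default_rng(0)
A, g2_true, bg = 2000.0, 0.05, 5.0
peak_areas = np.array([A] * 5 + [g2_true * A] + [A] * 5)   # centre at index 5
observed = rng.poisson(peak_areas + bg)                    # one "experiment"

def estimate_g2(counts):
    # Stand-in for histogram fitting: background-subtracted peak ratio.
    side = counts[np.arange(11) != 5] - bg
    return (counts[5] - bg) / side.mean()

# Data augmentation: bootstrap by Poisson-resampling the observed counts,
# propagating the detection-rate (Poisson) variability into the estimate.
boot = [estimate_g2(rng.poisson(observed)) for _ in range(5000)]
print(f"g2 = {estimate_g2(observed):.4f} "
      f"+/- {np.std(boot):.4f} (true {g2_true})")
```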


Understanding Deep Neural Networks via Linear Separability of Hidden Layers

Zhang, Chao, Chen, Xinyu, Li, Wensheng, Liu, Lixue, Wu, Wei, Tao, Dacheng

arXiv.org Artificial Intelligence

In this paper, we measure the linear separability of hidden layer outputs to study the characteristics of deep neural networks. In particular, we first propose Minkowski difference based linear separability measures (MD-LSMs) to evaluate the linear separability degree of two point sets. Then, we demonstrate that there is a synchronicity between the linear separability degree of hidden layer outputs and the network training performance, i.e., if the updated weights can enhance the linear separability degree of hidden layer outputs, the updated network will achieve better training performance, and vice versa. Moreover, we study the effect of activation function and network size (including width and depth) on the linear separability of hidden layers. Finally, we conduct numerical experiments to validate our findings on some popular deep networks including multilayer perceptron (MLP), convolutional neural network (CNN), deep belief network (DBN), ResNet, VGGNet, AlexNet, vision transformer (ViT) and GoogLeNet.
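As a simple illustration of measuring hidden-layer linear separability, here is a sketch that uses the training accuracy of a linear classifier as a separability proxy; the paper's MD-LSMs are Minkowski-difference based and differ from this surrogate.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_separability(hidden, labels):
    """Proxy separability score: training accuracy of a linear classifier
    fitted on a layer's outputs (a surrogate, not the paper's MD-LSM)."""
    clf = LogisticRegression(max_iter=1000).fit(hidden, labels)
    return clf.score(hidden, labels)

# Toy two-class data passed through a random "hidden layer" with ReLU.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (200, 10)), rng.normal(1, 1, (200, 10))])
y = np.repeat([0, 1], 200)
W = rng.standard_normal((10, 32))
H = np.maximum(X @ W, 0.0)                 # hidden layer output

print("input separability :", linear_separability(X, y))
print("hidden separability:", linear_separability(H, y))
```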


DC-MBR: Distributional Cooling for Minimum Bayesian Risk Decoding

Yan, Jianhao, Xu, Jin, Meng, Fandong, Zhou, Jie, Zhang, Yue

arXiv.org Artificial Intelligence

Minimum Bayesian Risk Decoding (MBR) emerges as a promising decoding algorithm in Neural Machine Translation. However, MBR performs poorly with label smoothing, which is surprising given that label smoothing provides decent improvement with beam search and improves generality in various tasks. In this work, we show that the issue arises from the inconsistency of label smoothing's effect on the token-level and sequence-level distributions. We demonstrate that even though label smoothing causes only a slight change at the token level, the sequence-level distribution is highly skewed. We coin this issue \emph{autoregressive over-smoothness}. To address it, we propose a simple and effective method, Distributional Cooling MBR (DC-MBR), which manipulates the entropy of the output distributions by tuning down the Softmax temperature. We theoretically prove the equivalence between pre-tuning the label smoothing factor and distributional cooling. Extensive experiments on NMT benchmarks validate that distributional cooling improves MBR in various settings.
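Schematically, cooling amounts to dividing logits by a temperature below one before sampling MBR candidates. The sketch below shows this on a toy distribution with an assumed trivial agreement utility; real systems sample full sequences and use utilities such as BLEU or chrF.

```python
import numpy as np

def sample_candidates(logits, tau=0.7, k=8, rng=None):
    """Ancestral-sampling stand-in: 'cool' the distribution by dividing
    logits by a temperature tau < 1 before sampling candidate indices."""
    rng = rng or np.random.default_rng(0)
    z = logits / tau
    p = np.exp(z - np.max(z))
    p /= p.sum()
    return rng.choice(len(logits), size=k, p=p)

def mbr_select(candidates, utility):
    # Pick the candidate with the highest average utility vs. all others.
    scores = [np.mean([utility(c, o) for o in candidates]) for c in candidates]
    return candidates[int(np.argmax(scores))]

# Toy usage: five "sequences" scored by a trivial agreement utility.
logits = np.array([2.0, 1.5, 1.0, 0.5, 0.0])
cands = sample_candidates(logits, tau=0.7)
print(mbr_select(list(cands), utility=lambda a, b: float(a == b)))
```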


Calibration of Neural Networks

Vasilev, Ruslan, D'yakonov, Alexander

arXiv.org Artificial Intelligence

Neural networks solving real-world problems are often required not only to make accurate predictions but also to provide a confidence level in the forecast. The calibration of a model indicates how close the estimated confidence is to the true probability. This paper presents a survey of confidence calibration problems in the context of neural networks and provides an empirical comparison of calibration methods. We analyze the problem statement, calibration definitions, and different approaches to evaluation: visualizations and scalar measures that estimate whether the model is well-calibrated. We review modern calibration techniques, both those based on post-processing and those requiring changes in training. Empirical experiments cover various datasets and models, comparing calibration methods according to different criteria.
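As a concrete example of the surveyed ingredients, here is a sketch of one scalar evaluation measure (expected calibration error) and one post-processing technique (temperature scaling); the binning choice and NLL objective follow common practice rather than any single paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logsumexp

def ece(conf, correct, n_bins=15):
    """Expected Calibration Error: confidence-vs-accuracy gap per bin."""
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    err = 0.0
    for b in range(n_bins):
        m = bins == b
        if m.any():
            err += m.mean() * abs(conf[m].mean() - correct[m].mean())
    return err

def temperature_scale(logits, labels):
    """Post-processing calibration: fit a single temperature T on held-out
    logits by minimising the negative log-likelihood."""
    def nll(T):
        z = logits / T
        logp = z - logsumexp(z, axis=1, keepdims=True)
        return -logp[np.arange(len(labels)), labels].mean()
    return minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded").x

# Toy usage: fit T on held-out logits, then evaluate ECE at that T.
rng = np.random.default_rng(0)
logits = rng.normal(0, 3, (1000, 10))
labels = rng.integers(0, 10, 1000)
T = temperature_scale(logits, labels)
probs = np.exp(logits / T) / np.exp(logits / T).sum(axis=1, keepdims=True)
conf, pred = probs.max(axis=1), probs.argmax(axis=1)
print(f"T = {T:.2f}, ECE = {ece(conf, pred == labels):.4f}")
```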


S2S-WTV: Seismic Data Noise Attenuation Using Weighted Total Variation Regularized Self-Supervised Learning

Xu, Zitai, Luo, Yisi, Wu, Bangyu, Meng, Deyu

arXiv.org Artificial Intelligence

Seismic data often suffers from severe noise due to environmental factors, which seriously affects subsequent applications. Traditional hand-crafted denoisers such as filters and regularizations utilize interpretable domain knowledge to design generalizable denoising techniques, but their representation capacities may be inferior to those of deep learning denoisers, which can learn complex and representative denoising mappings from abundant training pairs. However, due to the scarcity of high-quality training pairs, deep learning denoisers may suffer generalization issues across various scenarios. In this work, we propose a self-supervised method that combines the representation capacity of deep denoisers with the generalization ability of hand-crafted regularization for seismic data random noise attenuation. Specifically, we leverage the Self2Self (S2S) learning framework with a trace-wise masking strategy for seismic data denoising using solely the observed noisy data. In parallel, we employ a weighted total variation (WTV) regularizer to further capture the horizontally smooth local structure of seismic data. Our method, dubbed S2S-WTV, enjoys both the high representation ability brought by the self-supervised deep network and the good generalization ability of the hand-crafted WTV regularizer. Therefore, it can more effectively and stably remove random noise while preserving the details and edges of the clean signal. To tackle the S2S-WTV optimization model, we introduce an alternating direction method of multipliers (ADMM)-based algorithm. Extensive experiments on synthetic and field noisy seismic data demonstrate the effectiveness of our method compared with state-of-the-art traditional and deep learning-based seismic data denoising methods.
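For intuition, here is a minimal sketch (not the paper's implementation) of the two hand-crafted ingredients: a weighted TV penalty that favours lateral smoothness and a trace-wise Bernoulli mask in the Self2Self spirit. The weights, mask rate, and data layout (time by traces) are assumptions.

```python
import numpy as np

def weighted_tv(x, w_h=1.0, w_v=0.2):
    """Weighted total variation of a 2D section: differences across traces
    (horizontal) are weighted more heavily to favour laterally smooth events."""
    dh = np.abs(np.diff(x, axis=1))      # across traces (horizontal)
    dv = np.abs(np.diff(x, axis=0))      # along time samples (vertical)
    return w_h * dh.sum() + w_v * dv.sum()

def trace_mask(shape, drop_frac=0.1, rng=None):
    """Trace-wise Bernoulli mask: whole traces (columns) are hidden so the
    network must predict them from their neighbours, Self2Self style."""
    rng = rng or np.random.default_rng(0)
    keep = rng.random(shape[1]) > drop_frac
    return np.broadcast_to(keep, shape).astype(float)

# Toy usage on a random (time, traces) section.
section = np.random.default_rng(0).standard_normal((64, 32))
print(weighted_tv(section), trace_mask(section.shape).mean())
```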


Implementing Fair Regression In The Real World

Ruf, Boris, Detyniecki, Marcin

arXiv.org Artificial Intelligence

The potential risk of machine learning algorithms to unintentionally embed and reproduce bias, and thereby discriminate against various sub-populations in high-stakes decision-making applications, has given rise to the new research field of fair machine learning (Kamiran and Calders 2009; Corbett-Davies et al. 2018; Barocas, Hardt, and Narayanan 2019). Plenty of quantitative measures of fairness have been proposed (Dwork et al. 2011; Hardt, Price, and Srebro 2016; Chouldechova 2017; Berk et al. 2018), which opened up the way for three types of algorithms that seek to satisfy them: first, the pre-processing approach, which modifies the data representation prior to using classical algorithms (Kamiran and Calders 2012; Zemel et al. 2013); second, the in-processing approach, which intervenes during the learning phase by adding a fairness constraint to the optimization objective (Kamishima et al. 2012; Zafar et al. 2017; Zhang, Lemoine, and Mitchell 2018); third, the post-processing approach, which adjusts the outputs of classical algorithms. In a business context where an unconstrained real-world application were to be replaced with a fairer one, such extreme discrepancies would not be viable because individuals who were substantially negatively impacted would probably not accept the change and switch to a competitor. Based on our findings, we therefore propose algorithmic post-processing procedures to adjust for unwanted, extreme discrepancies between unconstrained and fair methods in order to enable a smooth transition from an "unfair" to a fairer model. The main contributions of this paper are as follows. First, we empirically examine the evolution of fair regression outputs compared to unconstrained predictors and demonstrate that some variations on the individual level may be unacceptable in practice; to the best of our knowledge, we offer the first investigation of this kind. Second, we propose a range of post-processing algorithms to mitigate this effect and thereby provide mechanisms to implement fair regression in practice.
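As one concrete (and purely illustrative) instance of such a post-processing procedure, the sketch below caps the per-individual change when moving from an unconstrained to a fair predictor; the delta threshold and the clipping rule are assumptions, not the paper's algorithms.

```python
import numpy as np

def cap_discrepancy(y_unconstrained, y_fair, delta):
    """Illustrative post-processing in the spirit described above (not the
    paper's exact algorithms): move from the unconstrained prediction towards
    the fair one, but never by more than delta per individual."""
    shift = np.clip(y_fair - y_unconstrained, -delta, delta)
    return y_unconstrained + shift

# Toy usage: a few individuals with large fair-vs-unconstrained gaps.
y_u = np.array([10.0, 20.0, 30.0])
y_f = np.array([12.0, 35.0, 18.0])
print(cap_discrepancy(y_u, y_f, delta=5.0))   # [12. 25. 25.]
```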